Text Mining for Documents Annotation and Ontology Support

Authors

  • Jan Paralic
  • Peter Bednar
Abstract

This paper presents a survey of basic concepts in text data mining and of some of the methods used to elicit useful knowledge from collections of textual data. Three text data mining techniques (clustering/visualisation, association rules and classification models) are analysed, and the possibilities for exploiting them within the Webocracy project are shown. Clustering and association rule discovery are well suited as supporting tools for ontology management. Classification models are used for automatic document annotation.

Similar resources

Linguistic Annotation for the Semantic Web

Establishing the semantic web on a large scale implies the widespread annotation of web documents with ontology-based knowledge markup. For this purpose, tools have been developed that allow for semi-automatic annotation of web documents with ontology-based metadata. However, given that a large number of web documents consist either fully or at least partially of free text, language technology ...

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering is one of the important techniques of text mining: the unsupervised classification of similar documents into different groups. The most important step...

Document indexing for automatic semantic annotation support

Nowadays, capturing the knowledge in ontological structures is one of the primary focuses of the knowledge management research. To exploit the knowledge from the vast quantity of existing unstructured texts available in natural languages in ontologies, tools for automatic semantic annotation (ASA) are heavily needed. In this paper, we present an approach to ASA and a method for documents conten...

Biomedical Document Triage Based on Figure Classification

The annotation task in model organism databases is to assign attributes, such as Gene Ontology (GO) codes, to biological entities, such as genes and proteins based on the evidence found in documents or other resources. Document triage precedes an annotation task; it identifies relevant documents that can support the annotation process. Annotation in organism databases involves manual efforts of...

Using Term-Matching Algorithms for the Annotation of Geo-services

This paper presents an approach for automating semantic annotation within service-oriented architectures that provide interfaces to databases of spatial-information objects. The automation of the annotation process facilitates the transition from the current state-of-the-art architectures towards semantically-enabled architectures. We see the annotation process as the task of matching an arbitr...

Publication date: 2003